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Abstract 

Numerous statistics have been proposed for the measure of offensive ability in major 
league baseball. While some of these measures may offer moderate predictive power 
in certain situations, it is unclear which simple offensive metrics are the most reliable 
or consistent. We address this issue with a Bayesian hierarchical model for variable 
selection to capture which offensive metrics are most predictive within players across 
time. Our sophisticated methodology allows for full estimation of the posterior distri- 
butions for our parameters and automatically adjusts for multiple testing, providing a 
distinct advantage over alternative approaches. We implement our model on a set of 50 
different offensive metrics and discuss our results in the context of comparison to other 
variable selection techniques. We find that 33/50 metrics demonstrate signal. However, 
these metrics are highly correlated with one another and related to traditional notions 
of performance (e.g., plate discipline, power, and ability to make contact). 
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1 Introduction 



/ don't understand. All of a sudden, it's not just BA and Runs Scored, it's OBA. 
And what is with O-P-S? - Harold Reynolds 



The past decade has witnessed a dramatic incre ase in intere st in baseball sta tistics, as evi- 
dence d by the popularity of the books Moneyball ( Lewis . 20031 ) and Curve Ball ( Albert and Bennet 



20031 ). Beyond recent public attention, the quanti t ative a nalysis of baseball c ontinues to be 
active area of sophisticated research (e.g. James ( 2008 ). Kahrl et al. ( 20091 )). Traditional 
statistics such as the batting average (BA) are constantly being suppl emented by mo re com- 



plicated modern metrics, su ch as the pow er hitting measure ISO ( Puerzerl . 
base-running measure SPD ( Jamesl . 1987 ). The goal of each measure remains the same: 



2003|) or the 



estimation of the true ability of a player on some relevant dimension against a background of 
inherent randomness in outcomes. This paper will provide a statistical framework for eval- 
uating the reliability of different offensive metrics where reliability is defined by consistency 
or predictive performance. 

Ther e has been s ubstantial previous research into measures of offensive performance in base- 
ball. ISilverl (120031 ) investigates the randomness of interseason batting average (BA) and finds 
significa nt mean reversion a mong players with unusually high batting averages in individual 
seasons. IStudemanl (j2007bl ) used s evera l players to investigate relationships between infield 



fly balls, line drives and hits. iNullI (120091 ) uses a sophisticated nested Dirichlet distribution to 
joint ly model fourte en batter measures and finds that statistical performance is mean revert- 
ing. Baumerl ( 2008 ) uses algebraic relationships to demonstrate the superiority of on-base 
percentage (OBP) over batting average (BA). 



Studemanl ( 2007a| ) considers four defense-independent pitching statistics (DIPS) for individ- 
ual batters: walk rate, strikeout rate, home-run rate, and batting average on balls in play 
(BABIP). These four measures form a sequence where each event is removed from the de- 
nominator of the next event, e.g., pl ayer can't s trike o ut if he walks, he cannot hit a home 



run if he walks or strikes out, etc. IStudemanl (l2007bl ) finds that the first three measures 



are quite consistent whereas the fourth measure, BA BIP, is quite n oisy. This BABIP mea- 
sure has been modified in many subsequent works. iLedererl (120091 ) considers BABIP and 
groundball outs in the 2007-08 seasons and concludes that h andedness and position (as a 



proxy for speed) are useful for predicting the two measures. iBrownl (120081 ) builds on this 
analysis by finding five factors that are predictive of BABIP and groundball outs: the ratio 
of pulled groundballs to opposite field groundballs, the percentage of grounders hit to center 
field , speed (Spd), bunt hits per plate appearance, and the ratio of home r uns to fly balls . 
Fair! (120081 ) analyzes the effects of age on various offensive metrics for hitters. iKaplanl (120061 ) 
decomposes several offensive statistics into both player and team level variation, and finds 
that player-level variation accounts for the large majority of observed variation. 

Our own contribution focusses on the following question: which offensive metrics are con- 
sistent measures of some aspect of player ability? We use a Bayesian hierarchical variable 
selection model to partition metrics into those with predictive power versus those that are 
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overwhelmed by noise. IScott and Bergerl (120061 ) use a similar variable selection approach 
to perform large-scale analysis of biological data. They provide a detailed exploration of 
the control of multiple testing that is provided by their Bayesian hierarchical framework, 
which is an advantage shared by our approach. We implement our model on 50 offensive 
metrics using MCMC methods. We present results for several parameters related to the 
within-player consistency of these offensive measures. We compa re our posterior inference 
to an alternative variable selection approach involving the Lasso (jTibshiranil . 119961 ). as well 
as performing a principal component analysis on our results. We find a large number of 
metrics (33/50) demonstrate signal and that there is considerable overlap with the results of 
the Lasso. However, these 33 metrics are highly correlated with one another and therefore 
redundant. Furthermore, many are related to traditional notions of performance (e.g., plate 
discipline, speed, power, ability to make contact). 



2 Methodology 

Our goal is a model that can evaluate offensive metrics on their ability to predict the future 
performance of an individual player based on his past performance. A good metric is one that 
provides a consistent measure for that individual, so that his past performance is indicative of 
his future performance. A poor metric has little predictive power: one would be just as well 
served predicting future performance by the overall league average rather than taking into 
account past individual performance. We formalize this principle with a Bayesian variable 
selection model for separating out players that are consistently distinct from the overall 
population on each offensive measure. In addition to providing individual-specific inferences, 
our model will also provide a global measure of the signal in each offensive measure. 



Our data comes from the (iKappelmanl . 120091 ) database. We have 50 available offense metrics 
which are outlined in Appendix |A] Our dataset contains 8,596 player-seasons from 1,575 
unique players spanning the 1974-2008 seasons. It is worth noting that the data for 10 of 
the 50 offensive metrics were not available before the 2002 season, and so for those metrics 
we fit our model on 1,935 player-seasons from 585 unique players^- For a particular offensive 
metric, we denote our observed data as y^ which is the metric value for player % during 
season j. The observed metric values y^ for each player is modeled as following a normal 
distribution with underlying individual mean (fi + on) and variance Wij ■ a 2 , 

yij ~ Normal(/i + a* , - a 2 ). (1) 

The parameter [i denotes the overall population mean (i.e., Major League Baseball mean) for 
the given offensive metric, and each on are the player-specific differences from that population 
mean \x. The weight term Wij addresses the fact that the variance of a season-level offensive 
metric for player i in season j is a function of the number of opportunities, and so player- 
seasons with more opportunities should have a lower variance. As an example, if the offensive 



lr These metrics are BUH, BUH/H, FB/BIP, GB/BIP, GB/FB, HR/FB, IFFB/FB, IFH, IFH/H, and 
LD/BIP 
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metric being modeled is on-base percentage (OBP), then the natural choice for the weight 
would the inverse of the number of plate appearances (PA). The weights used for each 
offensive metric are given in Appendix |A] With this formulation, the parameter a 2 represents 
the global variance of the offensive metric for player-seasons with an average number of 
opportunities. The global parameters \x and a 2 are unknown and are given the following 
prior distributions, 

\i ~ Normal(0, K 2 ) a 2 ~ Inverse — Gamma(a , Po) (2) 

where hyperparameter settings of K 2 = 10000, ao = .01, (3q = .01 were used to make 
these prior distributions non-informative. We also need to address our unknown player- 
specific parameters on. We could employ a conventional Bayesian random effects model which 
would place a Normal prior distribution shared by all oii parameters. Instead, we propose a 
more sophisticated model for the unknown individual Oj's that allows differentiation between 
players that are consistently different from the population mean from those that are not. 



2.1 Bayesian Variable Selection Model 

We formulate our sample of players as a mixture of (1) "zeroed" players where on = 
versus (2) "non-zeroed" players where oti ^ 0. We use the binary variable ji to denote the 
unknown group membership of each player i (ji = 4=> ctj = 0; 7$ = 1 4=> a, 7^ 0). We 
denote by p\ the unknown proportion of players that are in the non-zeroed group (7$ = 1) 
and use the prior distribution on ~ Normal(0, r 2 ) for them. For the players in the zeroed 
group, we have a point-mass at a* = 0. The variance parameter r 2 represents the deviations 
between individual players that have already been deemed to be different from the overall 
mean. When r 2 is large (particularly in relation to cr 2 ), this means that there is can be a 
potentially wide gulf between zeroed and non-zeroed players. 



George and McCullochl (119971 ) demonstrate that using a pure point-mass for a mixture com- 
ponent complicates model implementation. They suggest approximating the point-mass with 
a second normal distribution that has a much smaller variance, Vq ■ r 2 , where Vq is a hyper- 
parameter set to be quite small. In our model implementation, we set Vq = 0.01, meaning 
that the zeroed component has l/100th of the variance of the non- zeroed component. Thus, 
our mixture model on the player-specific parameters is 

/Normal(0, r 2 ) if 7i = 1 

Oti ~ S n (O) 

|Normal(0, v ■ r 2 ) if 7i = W 

We also illustrate this mixture in Figure [H The last two parameters of our model are r 2 and 
Pi. We give the following prior distribution to r 2 , 

r 2 ~ Inverse — Gamma^o, So). (4) 

where the hyperparameters are given v alues ifrn = -01 and So = .01 in order to be non- 



informative relative to the likelihood. iGelmanl (120061 ) cautions that the inverse-Gamma 
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Figure 1: Illustration of mixture for player-specific parameters oti. The black curve approximates a point 
mass at zero with a normal component that has a very small variance relative to the normal component for 
the non-zeroed players. 



family can actually be surprisingly informative for r 2 and suggests instead using a uniform 
prior on r, 

p(r) oc 1 =>- p(t 2 ) oc 1/t (5) 

We implemented this prior as a robustness check and found that our posterior results were 
nearly identical under these two different prior specifications. 

Finally, we allow the mixing proportion parameter p\ to be unknown with prior distribution 

Pi ~ Uniform(0, 1). (6) 



As discussed by Scott and Berger ( 20061 ). allowing pi to be estimated by the data provides an 



automatic control for multiple comparisons, which is an important advantage of our Bayesian 
methodology. Alternative approaches such as standard regression testing of individual means 
would require an additional adjustment for the large number of tests (1575 players) being 
performed. 

The mixing proportion pi is also an important model parameter for evaluating the overall 
reliability of an offensive metric, as it gives the probability that a randomly chosen player 
shows consistent differences from the population mean. Therefore, metrics with high signal 
should have a high p\. However, a high-signal metric should do more than fit an individual 
mean to a large fraction of players. It should also have little uncertainty about which specific 
players are in the "zeroed" and "non-zeroed" groups. We can evaluate this aspect of each 
metric by examining the uncertainty in the posterior distribution of the 7$ indicators for each 
player. Good metrics should show both a high p\ and a clear separation or consistency in 
the set of players assigned an individual mean and the set of players assigned the population 
mean. 



2.2 MCMC Implementation 



Let y be the vector of all player seasons y^ for the given offensive metric y. Similarly, 
let a and 7 denote the vectors of all cVs and all jiS respectively. The use of conjugate 
prior dis tributions outlined i n Sect ion [2] allows us to implement our model with a Gibbs 
sampler (IGeman and Gemanl (119841 )) where each step has a nice analytic form. Specifically, 
we iteratively sample from the following conditional distributions of each set of parameters 
given the current values of the other parameters. 
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1. Sampling yt, from p(fj,\a,a 2 ,y): 

Letting % index players and j seasons within a player, the conditional distribution for \x is 

/ Vij-oti \ 



/i\oc, a ,y ~ Normal 



\ 1,3 



J 



2. Sampling a from p(a|//,7, a 2 , r 2 ,y): 

Again letting i index players and j seasons within a player, the conditional distribution for 
each a,- is 



^ Ulio'-CT 2 



ai|/x,7i,a 2 ,r 2 ,|/ ~ Normal 



\ 



\ J 3 



J 



where r 2 = r 2 if 7, = 1 or r 2 = t> • r 2 if 7, = 0. 
3. Sampling a 2 from p(cr 2 |/x, a,y): 

Letting iV be the total number of all observed player-seasons, the conditional distribution 
for o 2 is 

a 2 \/j,,a,y ~ Inv - Gamma ( a + ^ , f3 + Y] ^ °^ — — ) 



4. Sampling r 2 from p(r 2 |o;): 

Letting m be the number of players, the conditional distribution for r 2 is 
r 2 \a ~ Inv — Gamma [ ipo + — , 5o + — — 



where Vi — 1 when 7$ = 1 and = vq if 7$ = 0. 

5. Sampling 7 from p(7|a, r 2 ,pi): 

We sample each 7$ as a Bernoulli draw with probability 



p(t< = il"*^ = — 



Pi " ex P I 



/l>0 



" ex P (-2^) +Pi" ex P (~^r) 



5. Sampling p 1 from p(pi|7): 



The mixing proportion pi has the conditional distribution 

Pipy ~ Beta I l + ^ 7i , 1 + ^(1 

\ i i 

For each offensive metric, we run our Gibbs sampler for 60,000 iterations and discard the 
first 10,000 iterations as burn-in. The remainder of the chain is thinned to retain every 50th 
iteration in order to eliminate autocorrelation of the sampled values. We present our results 
from our estimated posterior distributions in Section [3j 




3 Results 



We implemented our Bayesian variable selection model on the 50 available offense metrics 
outlined in Appendix [A] However, we first examined the observed data distribution for 
each of these metrics to see if our normality assumption ([1]) is reasonable. The majority of 
offensive metrics (36/50) have data distributions that are approximately normal. However, 
a smaller subset of metrics (14/50) do not have an approximately normal distribution but 
rather exhibit substantial skewnessj Examples are triples (3B) and stolen bases (SB) where 
the vast majority of players have very small values but there also exists a long right tail 
consisting of a small number of players with much larger values. The large proportion of zero 
values also makes many of these metrics less amenable to transformation. We proceeded to 
to implement our model on all 50 measures, but in the results that follow we will differentiate 
between those measures that fit the normality assumption versus those that do not. 



3.1 Evaluating Signal in Each Offensive Measure 

As discussed in Section 12.1} there are two aspects of our posterior results which are relevant 
for evaluating the overall signal in an offensive metric. The first aspect is the proportion of 
players pi that have individual means which differ from the population mean. If a metric 
has low signal, the population mean has similar predictive power to an individual mean. 
We prefer metrics where the individual mean has much more predictive power than the 
population mean for most players and a good proxy for this characteristic is pi. Thus, for 
each metric, we calculate the posterior mean pi of the pi parameter. 

In addition to having a large number of players with individual means (large pi), we also 
want a metric that has high certainty about which players are consistently different from the 
overall mean. Thus, the second aspect of our posterior results that we use to evaluate each 
metric is the amount of uncertainty in the posterior distributions of the player-specific 7, 
indicators. Specifically, a good metric should have posterior estimates where P(7, = 1) ~ 

2 These metrics are 3B, 3B/PA, BUH, BUH/H, CS, CS/OB, HBP, HDP/PA, IBB, IBB/PA, SB, SB/OB, 
SBPA, and SH 
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Figure 2: Plots of pi (y-axis) against the negative entropy. On the right hand plot, we zoom in on the 
subset of the data enclosed by the dotted rectangle in the upper right portion of the left plot and jitter the 
points for visibility. 



or P(7i = 1) ~ 1 for many players i. A good global summary of this aspect of our model is 
the negative entropy —H, 

- m 

-H = - Y\[li Mfi) + (1 - fO log(l - li)} 



m 



where % is the fraction of posterior samples where 7« = 1 for player i. The negative entropy 
is maximized (at —H = 0) when each is either or 1, which is the ideal situation for a 
consistent offensive metric. Correspondingly, the negative entropy is minimized when each 
7i ~ 0.5 which suggests large uncertainty about each player. 

On the left-hand side of Figure [2l we plot pi against the negative entropy —H for our 50 
offensive measures. Metrics colored in red were the majority that were reasonably approx- 
imated by a normal distribution, whereas metrics colored in black were not. We see three 
groupings of metrics in the left-hand side of Figure [2J We see a cluster of black (non- normal) 
metrics with low values of p\ which look to be poor metrics by our evaluation but are also 
a poor fit to our model assumptions. We also see a cluster of mostly red (normal) metrics 
that show intermediary p± values but low values of the negative entropy. These red metrics 
meet our model assumptions but the results indicate that they are poor metrics in terms of 
within-player consistency. The most interesting cluster are the predominantly red metrics 
that have both large pi and large values of negative entropy. These are the metrics that 
seem to perform well in terms of our evaluation criteria, and we zoom in on this group on 
the right-hand side of Figure [2j Within this group of 33 high signal metrics, we also see a 
fairly strong relationship between pi and negative entropy. 

Several of the best metrics suggested by our results are K/PA, Spd, ISO, BB/PA, and 
GB/BIP. The set of these best metrics spans several different aspects of hitting. K/PA 
and BB/PA are all related to plate discipline, Spd represents speed, ISO measures hitting 
power, and G B/BIP cap t ures th e tendency to hit ground balls. These findings are partly 



supporte d by IStudemanl (l2007aT) who finds that K rate and BB rate are very consistent. 
However, IStudemanl (j2007al ) also found that BABIP was a low signal metric, whereas our 
evaluation places BABIP among the high signal metrics. In addition to isolating a subset of 
good metrics, our evaluation based on both p\ and negative entropy provides a continuum 
in the right-hand plot of Figure [2] upon which we can further examine these good metrics. 
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ISO - Isolated Power 


BB (walk) rate 


Player 


Mean (// + «j) 


Player 


Mean (fi + a,) 




Estimate 


SD 




Estimate 


SD 


Mark McGwire 


0.320 




0.010 


Barry Bonds 


0.204 


0.004 


Barry Bonds 


0.304 




0.008 


Gene Tenace 


0.186 


0.007 


Ryan Howard 


0.293 




0.016 


Jimmy Wynn 


0.183 


0.010 


Jim Thome 


0.287 




0.009 


Ken Phelps 


0.176 


0.011 


Albert Pujols 


0.281 




0.011 


Jack Cust 


0.176 


0.012 


Population Mean fi = 


0.142 


Population Mean fi = O.Ot 


17 


Spd - 


- Speed 






K (strikeout) rate 


Player 


Mean (fi + a>i) 


Player 


Mean (fi + a>i) 




Estimate 


SD 




Estimate 


SD 


Vince Coleman 


8.55 




0.30 


Jack Cust 


0.388 


0.018 


Jose Reyes 


8.22 




0.40 


Russell Branyan 


0.376 


0.021 


Carl Crawford 


8.14 




0.36 


Melvin Nieves 


0.371 


0.020 


Willie Wilson 


8.13 




0.25 


Rob Deer 


0.351 


0.010 


Omar Moreno 


7.89 




0.31 


Mark Reynolds 


0.347 


0.018 


Population 


Mean fi = 


4.11 


Population Mean fi = 0.166 



Table 1: Top players for four high signal metrics. For each player, we provide the posterior estimate and 
posterior standard deviation for their individual mean (fi + cti). The estimated % was equal to 1.00 for each 
of these cases. The posterior estimate of the population mean fi is also provided for comparison. 

3.2 Examining Individual Players 

Four metrics found by our model to be high signal were ISO, BB rate, Spd and K rate. Each 
of these metrics measures a different aspect of offensive ability: ISO relates to hitting power, 
BB rate relates to plate discipline, Spd relates to speed, and K rate relates to the ability to 
make contact. We further explore our results by focusing on the top individual players for 
each of these measures, as estimated by our model. In Table [TJ we show the top five players 
in terms of their estimated individual means (fi + «j) for each of these metrics. 

For the isolated power (ISO) metric, each of the top five players are well-known hitters that 
have led the league in home runs at least once during their careers. Even more striking is 
the magnitude of their estimated individual means (fi + ojj), which are more than double 
the population mean fi = 0.142. Barry Bonds appears in the top 5 baseball players for both 
ISO and BB rate, and more generally, there is fairly strong correspondence between these 
two metrics beyond the results of Table [TJ This finding suggests that there is correlation 
between the skills that determine a batters plate discipline and the skills that lead to hitting 
for power. Other well-known players ranking high on BB rate (but outside of the top 5) 
are Jim Thome, Mark McGwire, Frank Thomas, and Adam Dunn. Barry Bonds does stand 
out dramatically with a walk rate that is almost 2% higher than the next highest player, 
the equivalent of 12-13 extra walks per season. This difference seems especially substantial 
when taking into account the small standard deviation (0.4%) in Bonds' estimated mean. 
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Jack Cust appears in the top 5 baseball players for both BB rate and K rate, which is 
especially interesting since having high BB rate is beneficial whereas having a high K rate 
is detrimental. However, it is not particularly surprising, since players with good plate 
discipline will frequently be in high count situations that can also lead to strike outs. Cust 
is especially well-known for having a "three-outcome" (i.e. walk, strikeout or home run) 
approach. Moving beyond the top 5 players, other power hitters such as Ryan Howard, 
Adam Dunn, and Jim Thome also exhibit high K rates. The top players on Bill James' 
speed metric Spd are a much different set of players than the previous three metrics. The 
highest estimated individual mean is held by former Rookie of the Year Vince Coleman, who 
led the National League in stolen bases from 1985 to 1990. 

A general theme of all four metrics examined in Table [1] is that there is consistency within 
players, as indicated by the relatively small standard deviations, but clear evidence of sub- 
stantial heterogeneity between players since the top players are estimated to have such a 
large deviation from the population mean. These two factors are an ideal combination for a 
high signal offensive metric. 



4 External Validation and Principal Components 

In the next subsection, we compare our results to an alternative variable selection approach 
based upon the Lasso. We then explore the correlation between offensive metrics with a 
principal component analysis. 



4.1 Comparison to the Lasso 



The Lasso ( Tibshirani ( 19961 )) is a penalized least squares regression that uses an L\ penalty 
on the estimated regression coefficients, 



0. 



Lasso 



argmm 

$ 



J2(vij - xSf + a 53 1 a 



A > 



(7) 



The Lasso implementation enforces sparsity on the covariate space by forcing some coeffi- 
cients to zero, and can therefore be used for variable selection. A more intuitive reformulation 



< /, where f3\ 



OLS 



of the Lasso is as a a minimization of -(y^ — Xi(5) 2 subject to ^ols\ 
is the coefficient from variable i in the ordinary least squares solution. The free parameter 
/ is known as the Lasso "fraction" and corresponds to A in Equation [71 / ranges between 
zero (corresponding to fitting only an overall mean or A = oo in Equation [7J and one (cor- 
responding to the ordinary least squares regression solution or A = in Equation [TJ. 

We apply the Lasso to our problem by centering each offensive metric and then fitting the 
regression model consisting only of indicators for each player. Each component of the f3 
vector corresponds to the individual mean of given player, and we are interested in which 
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Figure 3: Left: Plot of pi (y-axis) against the percentage of players with non-zero means selected by 
the Lasso. Right: Plot of percentage of players with non-zero means selected by the Lasso (y-axis) versus 
negative entropy 



of these individual means are fitted to be different from zero. To select a value of the free 
parameter /, we implemented multiple 5-fold cross validation (CV) by randomly subdividing 
all player-seasons into 5 groups and repeating ten times. We fit the Lasso on 4 groups and 
predict out-of -sample on the remaining player-seasons. This analysis is repeated for a fine 
grid of possible / values ranging between and 1, and we selected the / with the lowest 
cross-validated average RMSE. We then fit the Lasso model using this value of /. 

The outcome of interest from this Lasso regression is Lasso%, the percentage of players that 
are fitted with non-zero coefficients by the Lasso implementation. This measure represents 
a global measure of signal for each metric, and thus serves as an alternative to our model- 
based measures of pi and the negative entropy. We compare our model-based measures to 
the Lasso% measure in Figure El In the right plot of Figure El we see no real structure 
to the relationship between the negative entropy and the Lasso%. This is not surprising 
considering that the negative entropy is a measure of the variability in our own model-based 
results, which is not an equivalent measure to the Lasso%. 

More interesting is the comparison of p\ and Lasso% as these two measures are more inti- 
mately related. In the left plot of Figure El we see agreement between Lasso% and pi for 
many measures, especially the red measures that fit the normal model. These high signal 
measures with large value of pi also tend to have a large percentage of non-zero coefficients. 
The main difference between the two methods is with the black metrics that have skewed 
(non-normal) data distributions. These measures tend to have a high Lasso% but a low 
Pi, meaning that a Lasso-based analysis would attribute much more signal to these metrics 
than our mixture model-based analysis. Neither our model nor the Lasso is meant for the 
highly skewed data of these black metrics. The fact that the results from our model is more 
cautious about these metrics than the Lasso results suggests an advantage to our approach. 



4.2 Principal Components Analysis 

Among our metrics inferred to have high signal (right-hand plot of Figure [2]), we see a 
broad and continuous spectrum of performance on both pi and negative entropy. Ideally, 
there would be a more stark divide in the performance of these metrics, allowing us to 
focus on only a small subset of metrics as a complete summary of offensive performance. 
However, this is a difficult task in large part because of the high correlation between many 
of these metrics. As an obvious example, OPS is a linear combination of OBP and SLG. 
We performed a more systematic assessment of the correlation between metrics using a 
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Figure 4: Plots of the variance explained by each principal component for all metrics (left), for the high 
signal metrics (middle), and for the remaining metrics (right). For each plot we create a grey null band by 
randomly permuting the values within each column to demonstrate the strong significance of our results. In 
addition, we demonstrate the variability in our own principal components by creating bootstrap samples of 
player seasons and calculating the variance of the bootstrap principal components in red. 



Principal Components Analysis. PCA projects the data onto an orthogonal space such that 
each orthogonal component describes a decreasing amount of variance. 

Note that one of our 50 metrics, SBPA, was not included in this analysis due to a high 
number of player-seasons which had a denominator (SB+CS) equal to zero. The results 
from our principal components analysis on the remaining 49 metrics are shown in Figure HI 
We see that among the 49 metrics represented in the left-hand plot of Figure HI only about 
eight principal components have variance exceeding the null bands, which suggests that there 
are only about eight unique (orthogonal) metrics among the entire set of metrics. 

As a further consideration, we computed the principal components of 32 of the 33 high 
signal metrics^ as well as the principal components of the remainder of the metrics. If our 
hypothesis that much of the signal in the data comes from eight significant components, 
then we should expect to see about eight signicant principal components for the first set 
of data with a shape similar to that in the left panel of Figure HI on the other hand, for 
the remaining metrics, we expect to see fewer significant principal components as well as a 
curve which is less steep. As the middle panel of Figure H] shows, there are indeed about six 
or seven significant principal components in the set of 32 metrics. This means much of the 
signal in all 50 metrics is contained in the 32 (and, in fact, that there are only really about 
six or seven truly different ones among those 32). Furthermore, the 17 noisy metrics contain 
substantially less signal, as shown by the right panel of Figure HI 



5 Discussion 

We have introduced a hierarchical Bayesian variable selection model, which allows us to 
determine the ability of metrics for hitting performance to provide sound predictions across 
time and players. Our model does not require adjustment for multiple testing across play- 
ers and imposes shrinkage of player-specific parameters towards the population mean. For 
50 different offensive metrics, the full posterior distributions of our model parameters are 
estimated with a Gibbs sampling implementation. We evaluate each of these metrics with 
several proxies for reliability or consistency, such as the proportion of players found to differ 
over time from the population mean as well as the posterior uncertainty (in terms of negative 
entropy) of this deviation from the population mean in individual players. 

3 Again, excluding SBPA. 
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We find clear separation between 17 metrics with essentially no signal and 33 metrics that 
range along a continuum of having lesser to greater signal. The existence of this continuum 
is largely a function of the high correlation between each of these metrics. Our principal 
components analysis suggests that only a small subset of metrics are substantively different 
from one another. 

A direction of future research would be the creation of a reduced set of consistent metrics 
that were less highly dependent but still directly interpretable. We also plan to apply 
our methodology to multiple measures of pitching performance, which we anticipate will 
be less highly correlated with each other. Our sophisticated hierarchical model could be 
extended further to share information between metrics instead of the separate metric-by- 
metric analysis that we have performed. The relatively high correlation between some metrics 
could be used to cluster metrics together and reduce dimensionality. This approach would 
have the advantage of pooling across related measures and increasing effective degrees of 
freedom. 
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A Offensive Measures 

Our 50 offensive measures are subdivided below into categories for ease of presentation. 
Several terms that are not defined in the tables themselves are AB (at bats), BIP (balls 
in play), OB (total number of times on base), PA (plate appearances), and PA* (plate 
appearances minus sacrifice hits). 
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1. oimple 


hitting 


totals and rates 








Metric y^ 


Weight 


Wij Description 


Metric y^ 


Weight Wij 


Description 


IB 


PA 


singles 


IB/PA 


PA 


single rate 


2B 


PA 


doubles 


2B/PA 


PA 


double rate 


3B 


PA 


triples 


3B/PA 


PA 


triple rate 


HR 


PA 


home runs 


HR/PA 


PA 


home run rate 


R 


PA 


runs 


R/PA 


PA 


run rate 


RBI 


PA 


runs batted in 


RBI/PA 


PA 


runs batted in rate 


BB 


PA 


base on balls (walk) 


BB/PA 


PA 


walk rate 


IBB 


T> A 
PA 


intentional walk 


1 r i | > /t-> A 

IBB / PA 


PA 


intentional walk rate 


K 


PA 


strike outs 


K/PA 


PA 


strike out rate 


HBP 


PA 


hit by pitch 


HBP/PA 


PA 


hit by pitch rate 


BUH 


H 


bunt hits 


BUH/H 


H 


bunt hit proportion 


H 


PA 


hits 


GDP 


PA 


ground into double play 


SF 


PA 


sacrifice fly 


SH 


PA 


sacrifice hit 



2. More complicated hitting totals and rates 



Metric y^ 


Weight 


Description 


OBP 


PA* 


on base percentage (OB/PA*) 


AVG 


AB 


batting average (H/AB) 


SLG 


AB 


slugging percentage 


OPS 


AB x PA* 


OPB + SLG 


ISO 


AB 


isolated power (SLG- AVG) 


BB/K 


PA 


walk to strikeout ratio 


HR/FB 


PA 


home run to fly ball ratio 


GB/FB 


BIP 


ground ball to fly ball ratio 


BABIP 


BIP 


batting average for balls in play 


LD/BIP 


BIP 


line drive rate 


GB/BIP 


BIP 


ground ball rate 


FB/BIP 


BIP 


fly ball rate 


IFFB/FB 


FB 


infield fly ball proportion 


IFH 


GB 


in field hit 


IFH/H 


GB 


in field hit proportion 


wOBA 


PA* 


weighted on base average 


wRC 


PA 


runs created based on wOBA 


wRAA 


PA 


runs above average based on wOBA 



3. Baserunning totals and rates 



Metric y^ 


Weight w^ 


Description 


Metric 


Weight 


Description 


SB 


OB 


stolen bases 


SB/OB 


OB 


stolen base rate 


cs 


OB 


caught stealing 


CS/OB 


OB 


caught stealing rate 


SBPA 


SB + CS 


stolen bases per attempt 


Spd 


PA 


Bill James' speed metric 






i.e., SB/(SB+CS) 
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